Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs
Graph neural networks (GNNs), as the de-facto model class for representation
learning on graphs, are built upon the multi-layer perceptron (MLP)
architecture with additional message-passing layers that allow features to flow
across nodes. While conventional wisdom commonly attributes the success of GNNs
to their advanced expressivity, we conjecture that this is not the main cause
of GNNs' superiority in node-level prediction tasks. This paper pinpoints the
major source of GNNs' performance gain to their intrinsic generalization
capability, by introducing an intermediate model class dubbed
P(ropagational)MLP, which is identical to a standard MLP during training but
adopts the GNN architecture during testing. Intriguingly, we observe that PMLPs
consistently perform on par with (or even exceed) their GNN counterparts, while
being much more efficient in training. This finding sheds new light on the
learning behavior of GNNs and offers an analytic tool for dissecting various
GNN-related research problems. As an initial step
to analyze the inherent generalizability of GNNs, we show that the essential
difference between MLP and PMLP in the infinite-width limit lies in the NTK feature
map in the post-training stage. Moreover, by examining their extrapolation
behavior, we find that although many GNNs and their PMLP counterparts cannot
extrapolate non-linear functions for extremely out-of-distribution samples,
they show greater potential for generalizing to test samples near the training
data range, a natural advantage of GNN architectures.
Comment: Accepted to ICLR 2023. Codes in https://github.com/chr26195/PML
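The train-as-MLP, test-as-GNN idea can be illustrated with a minimal numpy sketch. All names here, and the mean-aggregation propagation matrix, are illustrative assumptions rather than the paper's exact architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(X, W1, W2):
    # Training-time model: a plain MLP that never sees the graph.
    return relu(X @ W1) @ W2

def pmlp_forward(X, A, W1, W2):
    # Test-time model: the *same* trained weights, but with mean-aggregation
    # message passing (row-normalized adjacency) inserted before each layer.
    P = A / A.sum(axis=1, keepdims=True)   # propagation matrix (assumption)
    H = relu(P @ X @ W1)
    return P @ H @ W2

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                          # 5 nodes, 4 features
A = np.eye(5) + np.roll(np.eye(5), 1, axis=0)        # toy graph with self-loops
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))

train_out = mlp_forward(X, W1, W2)        # used while fitting W1, W2
test_out = pmlp_forward(X, A, W1, W2)     # used at inference time
```

The point of the construction is that no graph-dependent parameters exist, so any gap between `train_out`-style and `test_out`-style predictions is attributable to propagation alone.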
Advective Diffusion Transformers for Topological Generalization in Graph Learning
Graph diffusion equations are intimately related to graph neural networks
(GNNs) and have recently attracted attention as a principled framework for
analyzing GNN dynamics, formalizing their expressive power, and justifying
architectural choices. One key open question in graph learning is the
generalization capability of GNNs. A major limitation of current approaches
is their reliance on the assumption that the graph topologies in the training
and test sets come from the same distribution. In this paper, we take steps towards
understanding the generalization of GNNs by exploring how graph diffusion
equations extrapolate and generalize in the presence of varying graph
topologies. We first show deficiencies in the generalization capability of
existing models built upon local diffusion on graphs, stemming from the
exponential sensitivity to topology variation. Our subsequent analysis reveals
the promise of non-local diffusion, which advocates for feature propagation
over fully-connected latent graphs, under the assumption of a specific
data-generating condition. In addition to these findings, we propose a novel
graph encoder backbone, Advective Diffusion Transformer (ADiT), inspired by
advective graph diffusion equations whose closed-form solution comes with
theoretical guarantees of the desired generalization under topological
distribution shifts. The new model, functioning as a versatile graph
Transformer, demonstrates superior performance across a wide range of graph
learning tasks.
Comment: 39 pages
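The local vs. non-local diffusion contrast can be sketched as single Euler steps in numpy. The function names, the step size `tau`, and the similarity-softmax propagation are illustrative assumptions:

```python
import numpy as np

def local_diffusion_step(H, A, tau=0.1):
    # One Euler step of local graph diffusion: features flow only along
    # observed edges, so the dynamics are sensitive to topology changes.
    P = A / A.sum(axis=1, keepdims=True)
    return H + tau * (P @ H - H)

def nonlocal_diffusion_step(H, tau=0.1):
    # Non-local diffusion over a fully-connected latent graph: propagation
    # weights come from feature similarity, not from the input edges.
    S = H @ H.T
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P = P / P.sum(axis=1, keepdims=True)   # row-softmax attention weights
    return H + tau * (P @ H - H)
```

Both steps preserve constant features (each propagation matrix is row-stochastic), but only the non-local step is independent of the observed adjacency, which is the property the abstract links to robustness under topology shift.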
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
Real-world data generation often involves complex inter-dependencies among
instances, violating the IID-data hypothesis of standard learning paradigms and
posing a challenge for uncovering the geometric structures for learning desired
instance representations. To this end, we introduce an energy constrained
diffusion model which encodes a batch of instances from a dataset into
evolutionary states that progressively incorporate other instances' information
by their interactions. The diffusion process is constrained by descent criteria
w.r.t. a principled energy function that characterizes the global consistency
of instance representations over latent structures. We provide rigorous theory
that implies closed-form optimal estimates for the pairwise diffusion strength
among arbitrary instance pairs, which gives rise to a new class of neural
encoders, dubbed DIFFormer (diffusion-based Transformers), with two
instantiations: a simple version with linear complexity for prohibitively
large instance numbers, and an advanced version for learning complex structures.
Experiments highlight the wide applicability of our model as a general-purpose
encoder backbone with superior performance in various tasks, such as node
classification on large graphs, semi-supervised image/text classification, and
spatial-temporal dynamics prediction.
Comment: Accepted by International Conference on Learning Representations (ICLR 2023)
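The linear-complexity instantiation rests on the standard kernelized-attention trick: with a positive feature map, associativity lets one avoid materializing the n×n pairwise matrix. A minimal numpy sketch, where the elu+1 feature map and all names are assumptions rather than the paper's exact diffusion update:

```python
import numpy as np

def linear_attention(Q, K, V):
    # Computes row-normalized phi(Q) phi(K)^T V without the n x n matrix:
    # phi(K)^T V is a d x d summary, so the cost is O(n d^2), not O(n^2 d).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu+1 (assumption)
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                  # d x d summary
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T       # n x 1 normalizer
    return (Qp @ KV) / Z

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
out = linear_attention(Q, K, V)                    # shape (6, 4)
```

The output is mathematically identical to first forming the full normalized attention matrix and multiplying by `V`; only the order of operations changes.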
Localized Contrastive Learning on Graphs
Contrastive learning methods based on InfoNCE loss are popular in node
representation learning tasks on graph-structured data. However, their reliance
on data augmentation and their quadratic computational complexity can lead to
inconsistency and inefficiency problems. To mitigate these limitations, in this
paper, we introduce a simple yet effective contrastive model named Localized
Graph Contrastive Learning (Local-GCL in short). Local-GCL consists of two key
designs: 1) We fabricate the positive examples for each node directly using its
first-order neighbors, which frees our method from the reliance on
carefully-designed graph augmentations; 2) To improve the efficiency of
contrastive learning on graphs, we devise a kernelized contrastive loss, which
could be approximately computed in linear time and space complexity with
respect to the graph size. We provide theoretical analysis to justify the
effectiveness and rationality of the proposed methods. Experiments on various
datasets with different scales and properties demonstrate that, in spite of its
simplicity, Local-GCL achieves highly competitive performance in self-supervised
node representation learning tasks on graphs.
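The neighbor-as-positive design can be written down directly. Below is a naive O(n²) numpy sketch of such a loss; the exact form and names are assumptions, and the paper's kernelized loss approximates a similar objective in linear time:

```python
import numpy as np

def local_gcl_loss(Z, A, tau=0.5):
    # Neighbor-positive contrastive loss: each node's positives are its
    # first-order neighbors (no augmentation needed); all other nodes act
    # as negatives. This quadratic form is for illustration only.
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = np.exp(Z @ Z.T / tau)          # pairwise similarity scores
    np.fill_diagonal(S, 0.0)
    pos = (S * A).sum(axis=1) / np.maximum(A.sum(axis=1), 1.0)
    return float(-np.log(pos / S.sum(axis=1) + 1e-12).mean())

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 3))
A = np.roll(np.eye(5), 1, axis=0) + np.roll(np.eye(5), -1, axis=0)  # ring graph
loss_val = local_gcl_loss(Z, A)
```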
The mechanisms of Yu Ping Feng San in tracking the cisplatin-resistance by regulating ATP-binding cassette transporter and glutathione S-transferase in lung cancer cells
Cisplatin is one of the first-line anti-cancer drugs prescribed for the treatment of solid tumors; however, chemotherapeutic drug resistance is still a major obstacle to cisplatin in treating cancers. Yu Ping Feng San (YPFS), a well-known ancient Chinese herbal combination formula consisting of Astragali Radix, Atractylodis Macrocephalae Rhizoma and Saposhnikoviae Radix, is prescribed as a herbal decoction to treat immune disorders in the clinic. To understand the fast-onset action of YPFS as an anti-cancer drug to fight against the drug resistance of cisplatin, we provided detailed analyses of intracellular cisplatin accumulation, cell viability, and the expression and activity of ATP-binding cassette transporters and glutathione S-transferases (GSTs) in YPFS-treated lung cancer cell lines. In cultured A549 or cisplatin-resistant A549/DDP cells, application of YPFS increased the accumulation of intracellular cisplatin, resulting in lower cell viability. In parallel, the activities and expression of ATP-binding cassette transporters and GSTs were down-regulated in the presence of YPFS. The expression of the p65 subunit of the NF-κB complex was reduced by treating the cultures with YPFS, leading to a high Bax/Bcl-2 ratio, i.e. an increased rate of cell death. Prim-O-glucosylcimifugin, one of the abundant ingredients in YPFS, modulated the activity of GSTs and then elevated cisplatin accumulation, resulting in increased cell apoptosis. The present results support the notion that YPFS reverses the drug resistance of cisplatin in lung cancer cells by elevating intracellular cisplatin, and the underlying mechanism may be the down-regulation of the activities and expression of ATP-binding cassette transporters and GSTs.
Handling Distribution Shifts on Graphs: An Invariance Perspective
There is increasing evidence of neural networks' sensitivity to
distribution shifts, which has brought research on out-of-distribution (OOD)
generalization into the spotlight. Nonetheless, current endeavors mostly
focus on Euclidean data, and the formulation for graph-structured data remains
unclear and under-explored, given two fundamental challenges: 1) the
inter-connection among nodes in one graph, which induces non-IID generation of
data points even under the same environment, and 2) the structural information
in the input graph, which is also informative for prediction. In this paper, we
formulate the OOD problem on graphs and develop a new invariant learning
approach, Explore-to-Extrapolate Risk Minimization (EERM), that enables
graph neural networks to leverage invariance principles for prediction. EERM
resorts to multiple context explorers (specified as graph structure editors in
our case) that are adversarially trained to maximize the variance of risks from
multiple virtual environments. Such a design enables the model to extrapolate
from a single observed environment, which is the common case for node-level
prediction. We prove the validity of our method by theoretically showing its
guarantee of a valid OOD solution and further demonstrate its power on various
real-world datasets for handling distribution shifts from artificial spurious
features, cross-domain transfers, and dynamic graph evolution.
Comment: ICLR 2022, 30 pages
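The predictor-side objective described above, mean risk plus the variance of risks across virtual environments, can be sketched in a few lines (the weight `beta` and the function name are assumptions):

```python
import numpy as np

def eerm_objective(env_risks, beta=1.0):
    # Mean risk plus variance of risks across the K virtual environments
    # generated by the adversarial graph editors. The editors are trained
    # to *maximize* the variance term; the predictor minimizes this total.
    r = np.asarray(env_risks, dtype=float)
    return float(r.mean() + beta * r.var())
```

When all environments incur the same risk, the variance term vanishes, which is exactly the invariance condition the adversarial editors try to break.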
Multisensory information facilitates the categorization of untrained stimuli
Although it has been demonstrated that multisensory information can facilitate object recognition and object memory, it remains unclear whether such a facilitation effect exists in category learning. To address this issue, comparable car images and sounds were first selected by a discrimination task in Experiment 1. Then, those selected images and sounds were utilized in a prototype category learning task in Experiments 2 and 3, in which participants were trained with auditory, visual, and audiovisual stimuli, and were tested with trained or untrained stimuli within the same categories presented alone or accompanied by a congruent or incongruent stimulus in the other modality. In Experiment 2, when low-distortion stimuli (more similar to the prototypes) were trained, there was higher accuracy for audiovisual trials than visual trials, but no significant difference between audiovisual and auditory trials. During testing, accuracy was significantly higher for congruent trials than unisensory or incongruent trials, and the congruency effect was larger for untrained high-distortion stimuli than trained low-distortion stimuli. In Experiment 3, when high-distortion stimuli (less similar to the prototypes) were trained, there was higher accuracy for audiovisual trials than visual or auditory trials, and the congruency effect was larger for trained high-distortion stimuli than untrained low-distortion stimuli during testing. These findings demonstrate that a higher degree of stimulus distortion results in a more robust multisensory effect, and that the categorization of not only trained but also untrained stimuli in one modality can be influenced by an accompanying stimulus in the other modality.
Trading Hard Negatives and True Negatives: A Debiased Contrastive Collaborative Filtering Approach
Collaborative filtering (CF), as a standard method for recommendation with
implicit feedback, tackles a semi-supervised learning problem where most
interaction data are unobserved. This nature makes existing approaches rely
heavily on mining negatives to provide correct training signals. However,
mining proper negatives is not a free lunch: it entails a tricky trade-off
between mining informative hard negatives and avoiding false ones. We
devise a new approach named Hardness-Aware Debiased Contrastive
Collaborative Filtering (HDCCF) to resolve the dilemma. It sufficiently
explores hard negatives from two aspects: 1) adaptively sharpening the
gradients of harder instances through a set-wise objective, and 2) implicitly
leveraging item/user frequency information with a new sampling strategy. To
circumvent false negatives, we develop a principled approach to improve the
reliability of negative instances and prove that the objective is an unbiased
estimation of sampling from the true negative distribution. Extensive
experiments demonstrate the superiority of the proposed model over existing CF
models and hard negative mining methods.
Comment: in IJCAI 202
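The "adaptively sharpening the gradients of harder instances" idea can be illustrated with a softmax-weighted set-wise loss. This is a simplified sketch under assumed names, not the paper's exact unbiased estimator:

```python
import numpy as np

def hardness_weighted_loss(pos_score, neg_scores, alpha=1.0):
    # Set-wise objective: negatives with higher scores (harder) receive
    # larger softmax weights, so their gradients are adaptively sharpened
    # relative to easy negatives. alpha controls the sharpening strength.
    s = np.asarray(neg_scores, dtype=float)
    w = np.exp(alpha * s)
    w = w / w.sum()                      # hardness-aware weights
    return float(np.log1p(np.sum(w * np.exp(s - pos_score))))
```

The loss decreases monotonically as the positive score grows relative to the negatives, which is the basic property any such set-wise ranking objective must satisfy.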